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Abstract — Hyperspectral  imaging  is  an  important  tool  in 
various  fields,  notably  geosensing  and  astronomy,  and  with  the 
development  of  new  devices,  it  is  now  also  being  applied  also  in 
medicine.  Concepts  and  tools  from  signal  processing  and  data 
analysis  need  to  be  employed  to  analyze  these  large  and  complex 
data  sets. 

In  this  paper,  we  present  several  techniques  which  generally 
apply  to  hyperspectral  data,  and  we  use  them  to  analyze  a 
particular  data  set. 

With  light  sources  of  increasingly  broader  ranges,  spectral 
analysis  of  tissue  sections  has  evolved  from  2  wavelength  im¬ 
age  subtraction  techniques  to  Raman  near  infra-red  micro- 
spectroscopic  analysis  permitting  discrimination  of  cell  types  and 
tissue  patterns.  We  have  developed  and  used  a  unique  tuned  light 
source  based  on  micro-optoelectromechanical  systems  (MOEMS) 
and  applied  algorithms  for  spectral  microscopic  analysis  of 
normal  and  malignant  colon  tissue.  We  compare  the  results  to  our 
previous  studies  which  used  a  tunable  liquid  filter  light  source. 

i.  Introduction 

Hyperspectral  imaging  is  an  important  tool  in  various  fields, 
notably  geosensing  and  astronomy,  and  with  the  development 
of  new  devices,  it  is  now  being  applied  also  in  medicine.  Tools 
from  signal  processing  and  data  analysis  need  to  be  employed 
to  analyze  these  large  and  complex  data  sets. 

A  first  important  challenge  is  to  compress  and  reduce  the  di¬ 
mensionality  of  the  data  available,  without  discarding  relevant 
information.  Depending  on  the  data  and  any  assumption/model 
for  it,  one  can  employ  different  signal  processing  techniques 
in  order  to  efficiently  compress  the  data. 

The  second  challenge  is  usually  a  classification  or  a, regres¬ 
sion  task.  It  is  important  to  look  for  features  which  enhance 
discrimination  among  different  classes  or  which  serve  as  good 
inputs  for  some  regression  algorithm. 

In  this  paper  we  present  various  techniques  from  signal 
processing,  wavelet  analysis,  spectroscopy  and  data  analysis 


in  general  that  we  consider  to  be  useful  in  the  study  of 
hyperspectral  data.  We  then  combine  these  a  selection  of 
these  tools  to  obtain  an  automated  classification  algorithm 
for  a  dataset  of  hyperspectral  images  of  stained  norma!  and 
malignant  colon  tissues. 

The  application  of  hyperspectral  imaging  to  medicine,  and 
pathology  in  particular,  while  not  new,  is  becoming  more 
widespread  and  powerful.  With  light  sources  of  increasingly 
broader  ranges,  spectral  analysis  of  tissue  sections  has  evolved 
from  2  wavelength  image  subtraction  techniques  to  hyper¬ 
spectral  analysis.  A  variety  of  proprietary  spectral  splitting 
devices,  including  prisms  and  mirror  [  1  ],  interferometers  [2], 
[3],  variable  interference  filter-based  monochromometers  [4] 
and  tuned  liquid  crystals  [5],  mounted  on  microscopes  in 
combination  with  CCD  cameras  and  computers  have  been 
used  to  discriminate  among  cell  types,  tissue  patterns  and  en¬ 
dogenous  and  exogenous  pigments  [6],  The  increasing  power 
of  these  methods  holds  promise  for  developing  automatic 
diagnostics,  though  the  increased  volume  of  the  data  collected 
requires  more  efficient  algorithms  to  analyze  the  data  in  a 
short  time.  Moreover,  in  addition  to  amount  of  data  collected, 
the  dimensionality  of  such  data  has  increased  dramatically, 
which  makes  the  extraction  of  statistically  useful  and  reliable 
information  much  harder. 

We  use  a  unique  prototype  tuned  light  source  (from  Plain 
Sight  Systems),  based  on  a  digital  mirror  device  (DMD),  to 
collect  hyperspectral  images  of  normal  and  neoplastic  samples 
from  tissue  microarrays.  Analysis  of  the  data  is  done  with 
a  combination  of  algorithms  from  the  fields  of  spectroscopy, 
chemometry  and  signal  processing.  The  goal  is  to  evaluate  the 
diagnostic  efficiency  of  hyperspectral  microscopic  analysis  of 
normal  and  neoplastic  colon  biopsies  prepared  as  microarray 
tissue  sections  [7]— [  10], 
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11.  Signal  Processing  and  Data  Analysis 

We  introduce  a  number  of  techniques  and  tools  of  general 
applicability  in  dealing  with  hyperspectral  data.  The  specifics 
of  the  algorithm  applied  in  the  present  situation  will  be  detailed 
in  a  later  section. 

A.  Compression  and  De-nosing 

In  many  important  applications,  such  as  astronomy  or 
medical  imaging,  there  is  a  great  amount  of  a  priori  knowledge 
on  the  type  of  hyperspectral  data  at  hand.  For  example  there 
are  good  models  for  each  layer  /;(a;,  y ),  and  for  each  spectrum 
s{x,y)  =  I.(xpy).  Each  /,;  is  an  image  to  which  standard 
image  compression  and  denoising  techniques  can  be  applied. 
Among  these,  wavelet  and  wavelet  packet  [1 1]  techniques 
are  now  classical  and  have  been  proven  to  be  very  effective. 
Novel  multiscale  techniques,  especially  curvelets  and  ridgelets 
[12] -[14]  are  also  vety  promising  and  have  been  applied  to 
astronomical  imaging  [15].  All  of  these  techniques  can  be 
applied  to  each  layer  I,  for  compression  or  denoising  of  the 
layer. 

On  the  other  hand,  each  spectrum  s(x,y)  is  naturally 
expected  to  be  a  smooth  function,  hence  amenable  to  one¬ 
dimensional  compression  and  denoising  techniques.  These 
could  be  Fourier-transform  based  or,  better,  wavelet  based. 

Another  possibility  is  to  do  a  full  3-D  analysis,  by  using 
for  example  3-D  wavelet  packets,  or  more  recent  tools  such 
as  3-D  curvelets.  On  our  dataset,  for  example,  3-D  wavelet 
packet  based  denoising  performed  very  well. 

B.  Dimensionality  reduction 

The  problem  or  reducing  the  dimensionality  of  the  data, 
while  preserving  relevant  structures  of  the  data,  is  extremely 
important  and  has  received  a  great  amount  of  attentionin 
the  last  few  years  from  many  researchers  across  different 
disciplines. 

To  be  concrete,  let  us  consider  the  example  of  the  hyper¬ 
spectral  images  at  hand.  Each  spectrum  is  a  point  in  128- 
dimensional  space.  If  we  look  at  the  ensemble  of  all  spectra  in 
a  data  cube,  or  even  in  all  data  cubes,  we  do  not  expect  them  to 
be  scattered  in  128  dimensions.  In  fact,  we  could  easily  argue 
that  they  should  be  highly  concentrated  around  a  submanifold 
of  low  intrinsic  dimension.  For  example,  one  parameter  may 
be  the  total  energy  of  the  spectrum,  and  two  others  could  the 
absorbance  of  hematoxylin  and  eosin  stains,  respectively,  at 
particular  wavelengths.  These  three  parameters  already  would 
contain  most  of  the  information  about  each  spectrum,  and  in 
fact  this  is  more  or  less  the  only  information  that  a  pathologist 
has,  looking  at  one  of  the  samples  in  the  microscope.  It  just 
happens  that  we  are  measuring  128  numbers,  but  far  fewer 
parameters  would  sufficient  to  identify  a  spectrum.  The  natural 
questions  are  then:  how  do  we  discover  the  parameters  and 
how  many  do  we  really  need? 

We  will  describe  a  few  of  the  many  approaches  suggested 
to  solve  this  problem,  broadly  subdividing  them  into  linear 
and  nonlinear  techniques.  Linear  techniques  in  general  project 
the  data  on  some  low-dimensional  subspace,  so  that  impor¬ 
tant  features  of  the  data  are  preserved,  where  the  measure 


of  importance  has  to  be  defined,  and  is  often  application- 
specific.  Linear  techniques  include  random  projections,  princi¬ 
pal  component  analysis,  partial  least  squares,  several  variations 
of  these,  and  many  others.  In  the  first  two  techniques,  the 
important  features  of  the  data  that  one  seeks  to  preserve  are 
essentially  the  pairwise  distances,  in  the  third  a  function  (e.g. 
labels)  on  the  the  data  is  given  and  is  taken  into  account  in 
the  computation  of  the  subspace. 

I)  Local  Discriminant  Bases:  Local  Discriminant  Bases 
(LDB)  of  Coifman  and  Saito  [16],  [17]  apply  naturally  to 
a  family  of  labeled  vectors  that  represent  smoothly  varying 
functions,  for  example  spectra  and  sounds.  The  labels  of  these 
vectors  may  correspond  to  more  or  less  well-defined  clusters 
in  the  data,  though  determining  those  labels  via  clustering  or 
other  unsupervised  non-linear  separation  methods  can  be  very 
expensive,  if  not  unfeasible,  due  to  the  high  dimensionality 
of  the  vectors.  The  goal  of  LDB  is  to  find  directions  in  these 
high  dimensional  spaces  such  that  the  data  projected  onto  these 
directions  are  still  well-discriminated.  Then  discriminating  the 
low  dimensional  projections  of  the  data  should  be  almost 
as  good  as  discriminating  in  high  dimension  with  all  the 
advantages  and  tools  available  in  lower  dimensional  spaces. 
At  the  same  time,  while  discriminating  features  are  preserved, 
confounding  features  are  removed,  thus  denoising  the  data 
with  respect  to  the  discrimination  task  at  hand. 

The  search  for  features  in  high  dimensional  spaces  is 
notoriously  difficult.  One  way  LDB  alleviate  some  aspects 
of  the  “curse  of  dimensionality”  is  by  searching  sub-optimal 
projections  among  hierarchically  well-organized  dictionaries 
of  wavelet  or  Fourier  packets.  There  are  fast  algorithms  with 
which  perform  such  a  search  and  to  compute  the  projections 
onto  ensembles  of  these  patterns.  We  use  a  version  of  LDB 
that  uses  arbitrary  Haar  packet  decompositions  of  the  phase- 
space,  but  other,  even  less  flexible,  wavelet  dictionaries  would 
work  as  well. 

Nonlinear  techniques  include  local  linear  embeddinge 
(LLE)  [18],  Laplacian  Eigenmaps  [19],  Hessian  Eigenmaps 
[20]  and  Diffusion  maps  [2 1]— [27],  which  together  have  re¬ 
ceived  a  lot  of  attention  in  the  last  few  years.  Many  of  these 
techniques  are  based  on  the  idea  that  the  data  lies  on  some 
manifold  in  a  high  dimensional  space,  but  with  the  intrinsic 
dimensionality  of  the  manifold  actually  being  quite  low  due 
to  constraints  in  the  data  allowing  for  a  description  by  few 
parameters. 

Here  we  would  like  to  illustrate  the  use  of  Principal  Com¬ 
ponent  Analysis  and  Diffusion  Maps  applied  to  this  particular 
dataset.  Similar  results  would  be  expected  in  the  analysis  of 
other  types  of  hyperspectral  data,  for  example  astronomical 
hyperspectral  data. 

We  consider  a  data  cube  with  the  spectra  centered  around 
their  mean,  and  we  compute  the  principal  components  of  the 
centered  spectra  contained  in  the  cube.  This  is  computationally 
quite  expensive,  so  in  practice  we  select  a  random  subset  of 
spectra  and  we  compute  the  principal  components  for  that 
subset.  We  immediately  discover  that  the  top  20  principal 
components  capture  almost  95%  of  the  energy  of  the  data. 
In  particular,  inner  products  and  pair-wise  Euclidean  distances 
could  be  computed  on  the  projection  onto  the  top  few  principal 


components  with  very  good  precision  (and  less  sensitivity  to 
noise!).  The  projection  onto  the  principal  components,  when 
viewed  layer  by  layer,  is  represented  in  Figure  1.  Despite 
eliciting  some  structure  within  the  data,  the  projections  are 
not  particularly  informative. 

If  the  low-dimensional  set  on  which  the  spectra  actually  lie 
is  quite  nonlinear,  in  general  there  will  not  be  a  linear  subspace 
onto  which  the  data  can  be  meaningfully  projected.  So  while 
the  principal  components  analysis  does  show  that  the  intrinsic 
dimensionality  of  the  spectral  data  is  rather  small,  it  does  not 
help  in  extracting  good  parameters  and  understanding  clusters 
in  the  spectral  space. 

We  adopt  a  nonlinear  technique  based  on  diffusion  [26], 
[27]  in  order  to  better  understand  the  structure  of  the  data. 
Instead  of  looking  at  the  directions  of  maximum  variability, 
as  principal  component  analysis  does,  this  technique  looks  at 
each  spectrum,  and  at  the  connections  between  each  spectrum 
with  its  very  closest  neighbors.  It  then  looks  at  how  these 
connections  allow  a  random  walker  to  explore  the  data. 
The  idea  is  that  the  connections  inside  each  cluster  will  be 
numerous  and  strong,  and  connections  across  clusters  will 
be  fewer  and  weaker.  It  is  then  possible  to  construct  a  map 
from  spectral  space  to  Euclidean  space  such  that  the  Euclidean 
distance  between  two  points  measured  in  the  range  is  equal 
to  the  “diffusion  distance”  between  those  two  points  on  the 
original  data  set.  Moreover,  this  map  has  the  form 

s  >->  ($i(*),  $2 (at).  •  •  • ,  $k(x)X  , 

where  the  functions  <1>  are  defined  on  the  set  of  spectra,  and 
are  eigenfunctions  of  a  Laplacian  defined  on  the  set  of  spectra 
itself,  interpreted  as  a  graph.  Details  and  discussion  of  these 
ideas  can  be  found  in  references  [19],  [22],  [23]. 

When  we  apply  this  technique  to  the  spectra  in  a  data  cube, 
we  get  a  much  more  meaningful  descriptions  of  the  data,  and 
in  fact  various  eigenfunctions  <E>£  separate  very  well  between 
different  tissue  types.  This  is  a  consequence  of  the  staining, 
which  we  had  reasonably  expected  as  being  one  of  the  most 
important  parameters  (see  Figure  2  and  3). 

The  parameters  discovered  with  this  algorithm  allow  one  to 
“virtually  stain”  the  biopsy,  and  could  be  mapped  from  biopsy 
to  biopsy  in  order  to  resolve  normalization  issues  that  greatly 
affect  global  distances  between  points. 

This  technique  can  be  used  effectively  for  segmentation  of 
the  data  cube.  Spectral  features,  together  with  spatial  features 
(for  example  filter  responses  to  various  texture  or  edge  filters) 
can  be  clustered  using  the  eigenvectors  of  the  diffusion  process 
on  these  features,  and  effectively  find  clusters  corresponding 
to  segmentations  of  the  data  cube.  See  for  example  [28],  [29], 

C.  Classification  and  Regression  Techniques 

The  goal  of  the  analysis  of  a  set  of  hyperspectral  images 
may  be  classification  or  regression.  Depending  on  the  appli¬ 
cation  and/or  goal,  one  may  want  to  classify  single  spectra, 
or  groups  of  spectra  around  particular  locations.  For  example, 
in  astronomy  one  may  want  to  classify  galaxy  types  based  on 
their  spatial  configuration  and  spectral  characteristics.  In  our 
example,  we  want  to  discriminate  between  normal  nuclei  and 
abnormal  (malignant)  nuclei  in  various  regions  of  the  tissue. 


In  general,  seeking  features  in  the  full  3-dimensional  data 
cube  can  lead  to  the  best  results,  since  the  various  spatial  and 
spectral  correlations  help  in  denoising  the  data,  and  can  be 
used  to  define  features  of  local  aggregates  which  can  be  much 
more  meaningful  than  features  of  a  single  spectrum. 

Without  giving  a  full  discussion  of  all  details,  we  present 
here  the  techniques  that  will  be  the  building  blocks  of  the  final 
algorithm. 

1)  Nearest  Neighbor  Classifier:  Given  a  set  of  points  {x,}, 
with  corresponding  labels  {/;}.,  and  given  a  test  point  y,  the  k- 
nearest  neighbor  classifier  assigns  the  label  /  ,;*  to  y  as  follows. 
The  k  closest  points  {x£, , . . . ,  xik }  to  y  are  found.  Then  the 
most  frequent  label  among  {/tl , . . .  ,l,t  }  is  assigned  to  y. 
Ties  are  broken  randomly.  The  1-ncarest  neighbor  classifier  has 
many  good  theoretical  properties,  and  performs  extremely  well 
when  the  number  of  training  points  is  large.  When  the  number 
of  points  is  small  for  the  dimension  in  which  the  points  are 
given,  then  fc-nearest  neighbor  classifiers  with  k  >  1  may  be 
preferable  since  the  choice  k  >  1  corresponds  to  regularizing 
the  data  in  a  particular  way. 

2)  Partial  Least  Squares:  Given  a  set  of  points  {a;.};  and  a 
function  /  defined  on  these  points,  and  an  integer  k  >  0,  PLS 
[30]— [34]  computes  a  set  of  orthonormal  vectors  {?>  i, ....  t;*.}, 
and  a  ^-dimensional  vector  w,  and  then  extrapolates  /  at  y  by 
first  computing  P(y),  the  projection  of  y  onto  the  subspace 
spanned  by  {vi, . . .  ,Vk}  and  then  letting  f(y)  —<  y.w  >. 
The  computation  of  the  vectors  v  i , . . . ,  vr  is  done  in  the 
following  way.  Once  the  first  i  vectors  Vi, . . .  ,ty  have  been 
constructed,  vi+l  is  the  vector  that  solves  the  problem 

max  Corr2  /,  Y' (ifixt  Var  Y' (vfixi 
!MI=>  I  / 

where  (v)i  denotes  the  Z-th  coordinate  of  v,  and  where  for 
i  =  0  we  consider  the  maximization  , over  all  v  of  norm  1.  Once 
the  k  vectors  have  been  found,  the  data  is  projected  onto 
the  subspace  spanned  by  these  vectors  and  linear  regression  is 
used  in  that  subspace.  The  projection  has  the  goal  of  denoising 
the  data  but  in  such  a  way  as  to  preserve  directions  that  have 
strong  linear  correlation  with  the  function  to  be  predicted. 


III.  Experimental  Design 

A.  Platform 

The  prototype  tuned  light  source  [35]  transilluminates 
hematoxylin  and  eosin  (FI  &  E)  stained  micro-array  tissue  sec¬ 
tions  with  arbitrary  combinations  of  light  frequencies,  ranging 
from  about  440  nm  to  about  700  nm,  through  a  Nikon  Biophot 
microscope.  To  collect  the  initial  data  we  used  the  flexibility 
of  the  light  source  to  implement  a  randomized  version  of 
the  standard  Hadamard  multiplexing  for  spectra  measurement, 
in  order  to  reduce  noise  and  biases  in  the  signal-to-noise 
ratios  of  the  collected  data.  Hyperspectral  tissue  images  are 
collected  with  a  CCD  camera  (Sensovation)  and  are  analyzed 
mathematically  with  a  PC,  using  algorithms  written  in  Matlab. 
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Fig.  !.  Top  left:  The  first  few  principal  components  of  the  spectra  of  a 
sample,  top  right  and  bottom  are  the  coefficients  of  the  projection  of  each 
spectrum  onto  the  first,  second  and  third  principal  components. 


Fig.  2.  Moving  left  to  right,  top  to  bottom,  the  eigenfunctions  ^  of  the 
diffusion  on  the  data  set  of  spectra,  for  k  =  1,2, 3, 4,  5, 6  respectively,  are 
mapped  to  the  colors. 


Fig.  3.  The  vector  (y>i (x), <P2{x),  93(x))  is  normalized  and  mapped  onto 
RGB  colors. 


B.  Image  source 

137  (66  normal,  71  malignant)  hyperspectral  gray  scale 
images  at  400x  magnification  are  derived  from,  respectively, 
58  and  62  different  tissue  microarray  biopsies. 

C.  Data  Acquisition 

Each  measurement  yields  a  data  cube  C,  which  is  a  set 
128  of  images,  each  of  which  495  by  656  pixels.  The 
intensity  of  the  pixel  7,  (x,  y)  ideally  represents  the  transmitted 
light  at  location  (x,y)  when  the  i-th  light  pattern  is  shone 
through  the  sample.  The  measurement  of  the  hyperspectral 
image  is  subject  to  noise,  which  is  roughly  independent  of 
the  intensity  of  light  shown  through  the  sample.  In  order  to 
maximize  the  signal-to-noise  ratio  of  the  measurement  of  each 
li,  given  a  fixed  integration  time,  one  needs  to  maximize  the 
amount  of  light  shone  through  the  sample.  The  flexibility  of 
the  instrument  allows  for  shining  arbitrary  patterns  ipi  of  light, 
in  the  form 

N 

!;?,(/./)  (1) 

j=l 

where  f,  e  {0,1}  and  5,  represents  approximately  a  <5-fimction 
at  frequency  of  index  i  (and  (V  =  128  in  our  experiment,  but 
the  instrument  would  allow  up  to  N  =  1024), 

Hence  we  can  think  of  Ii(x,y)  as  the  value  of  the  inner 
product 

<  f{xyy,v),i>i{v)  >„  , 

where  f(x,  y,  v)  is  the  transmittance  of  the  sample  at  location 
(x,  y)  and  frequency  u. 

A  raster  scan  consists  in  shining  the  sequence  { i/ ' i }  =  {(’h}. 
In  this  case  the  energy  of  light  shone  through  for  each  /,  will 
be  of  the  order  of 

Eq 
N  ’ 

E0  being  the  intensity  of  the  light  source.  Hence,  reasonable 
signal-to-noise  ratios  could  be  obtained  only  by  integrating  for 
a  long  time. 

Multiplexing  allows  a  much  faster  scan,  for  a  given  signal- 
to-noise  ratio,  and  consists  in  shining  a  sequence  of  Hadamard 
patterns  These  patterns  have  the  property  that 

for  each  i  there  are  y  non-zero  f,;’s  in  (1)  (so  that  the  energy 
of  the  light  shining  through  the  sample  is  about  y1  for  the 
measurement  of  each  /,),  and  also  these  patterns  are  quite 
independent.  These  patterns  have  a  multiscale  structure,  in  the 
sense  that  the  index  set  {1, . . . ,  N}  can  be  split  into  subsets 


i 

j 
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Fig.  4.  Examples  of  Hadamard  patterns  (left),  and  randomized  Hadamard 
patterns  (right). 


Fig.  5.  Example  of  expansion  of  a  smooth  spectrum  on  standard  Hadamard 
patterns  (left)  and  on  random  Hadamard  patterns  (right). 

{ Ji, . . . ,  Jiog  jv}  such  that  the  patterns  in  each  subsets  are 
constant  on  dyadic  intervals  at  a  certain  scale.  However,  it 
turns  out  that  in  this  way  the  signal-to-noise  ratio  is  not 
uniformly  distributed.  This  is  a  consequence  of  the  smoothness 
of  the  spectra  to  be  measured  and  of  the  structure  of  the 
system  of  Hadamard  functions,  which  implies  a  priori  a 
decay  of  |  <  /,  il>^  >  |  as  a  function  of  j.  To  spread 
the  signal-to-noise  ratio  uniformly  among  the  coefficients,  we 
consider  randomized  Hadamard  functions,  which  we  obtain  by 
building  a  random  bijection  m  :  {1, . . . ,  N}  — *■  {1, . . . ,  N} 
and  considering  {u)  =  \pH  (m{y)).  We  compute  this 
random  bijection  once  and  use  the  induced  shuffling  in  all 
of  our  measurements.  Of  course,  the  change  of  variable  m 
simply  induces  an  orthogonal  transformation  between 
and  {ip^H}.  In  Figure  6  we  show  a  collected  spectrum  and 
the  corresponding  physical  spectrum.  It  is  clear  how  the  size 
of  all  the  collected  coefficients  {<  >}  is  roughly 

constant  in  i.  The  indices  2  and  4  are  special  since  the 
corresponding  mirror  patterns  are  set  entirely  to  0  in  order 
to  measure  background. 

IV.  Algorithm 

The  algorithm  aims  at  discriminating  between  normal  and 
abnormal  data  cubes.  In  fact,  it  would  be  more  useful  to  be 
able  to  classify  normal  and  abnormal  (malignant)  regions  in 
each  sample.  This  would  be  particularly  important  in  order  to 
be  able  to  spot  abnormal  (malignant)  regions,  which  are  small 
and/or  only  partially  present  in  the  sample  in  question. 

The  way  a  trained  pathologist  would  work  in  analyzing 
these  samples  is  mainly  through  pattern  recognition.  He  would 
look  for  characteristic  structures  of  large  ensembles  of  cells, 
such  as  the  structure  of  glands,  their  shape,  size,  arid  to  smaller 
details  such  as  the  shape,  size,  density  and  granularity  of  the 


Fig.  6.  A  typical  physical  spectrum 


Fig.  7.  Two  different  spectral  slices  of  a  normal  sample. 

nuclei.  This  kind  of  pattern  analysis  is  mainly  based  on  rather 
large  scale  features,  and  it  could  yield  inaccurate  results  on 
smaller  regions. 

Our  algorithm  will  generate  a  classifier  for  square  regions, 
or  patches,  with  edges  of  a  certain  length  l  which  are  “admis¬ 
sible”,  in  the  sense  that  contain  a  certain  density  of  nuclei,  as 
specified  below.  Each  data  cube  or  sample  will  contain  several 
such  “admissible”  patches,  each  of  them  roughly  centered 
around  a  nucleus,  and  of  size  about  the  size  of  the  nucleus. 
Each  patch  can  be  viewed  as  a  cloud  of  l 2  spectral  vectors  in 
R.128,  We  will  then  classify  a  whole  slide  by  voting  among 
the  classifications  of  the  patches  in  that  slide. 

We  naturally  divide  the  algorithm  into  the  following  build¬ 
ing  blocks: 

(1)  Nuclei  identification 
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(2)  Collection  of  admissible  patches. 

(3)  Construction  of  the  classifier  based  on  the  mean  of  the 
nuclei  spectra  in  each  patch. 

Step  /.  Nuclei  identification  The  first  task  is  to  extract  the 
nuclei  spectra  from  a  data  cube.  This  is  essentially  a  tissue 
classification  task,  and  we  expect  this  to  be  easy  since  the 
H&E  stain  used  for  the  preparation  of  the  slides  differentiates 
between  nuclei  and  the  other  tissue  components.  We  seek 
spectral  signatures  that  allow  one  to  discriminate  the  spectra 
of  nuclei  from  all  the  other  spectra.  In  order  to  do  this,  we 
selected  about  3, 000  spectra  from  two  distinct  datacubes, 
about  one  third  of  which  belonged  to  each  of  three  different 
classes:  {nuclei,  cytoplasm,  lamina  propria/other}.  Let  these 
samples  be  denoted  by 


{  {tV./.  }?:  }/£  {nuclei, cytoplasm, lamina  propria}  W  R. 


128 


We  normalize  this  set  of  spectra  so  that  each  spectrum  has 
£2-norm,  or  energy,  equal  to  1: 

Vi, i 


Wc  then  used  LDB  on  v,. y  to  find  features  that  best 
discriminate  among  the  different  classes.  We  found  that  4 
spectral  signatures  (see  Figure  9)  are  enough  to  discriminate 
among  the  various  tissue  types,  in  particular  they  are  enough 
to  discriminate  well  the  nuclei  spectra  from  all  the  others. 
We  project  our  (normalized)  training  set  onto  these  4  features, 
see  Figure  10,  and  to  classify  a  spectrum  from  any  data  cube 
we  normalize  it  and  project  onto  these  4  features.  On  this 
projection,  we  use  a  15-nearest-neighbor  classifier  to  identify 
to  which  of  the  three  classes  the  spectrum  belongs.  Notice 
that  the  dimensionality  reduction  has  a  de-noising  effect  on 
the  spectra,  thus  regularizing  the  distance  computations  used 
by  the  nearest-neighbor  algorithm  used  to  classify  in  the  ap¬ 
propriate  low-dimensional  subspace.  Let  us  denote  by  C uSSue 
the  classifier  that  computes  this  projection  and  classifies  into 
tissue  types  as  just  described.  The  performance  of  the  classifier 
Ct.is.?ue  is  quite  good,  uniformly  over  all  datacubes.  Mistakes 
are  isolated  and  can  be  easily  removed  by  voting  among 
the  spatial  (x,  y )  neighbors.  From  now  on  we  declare  that  a 
spectrum  is  a  nucleus  spectrum  if  it  is  classified  as  a  nucleus 
spectrum  by  CWue- 

It  is  important  to  remark  at  this  point  that  the  instrument  is 
able  to  directly  measure  the  projection  of  the  spectrum  onto 
the  LDB  light  patterns  by  shining  exactly  these  4  patterns  of 
light  through  the  sample.  The  results  of  these  measurements 
can  be  provided  immediately  to  the  nearest  neighbor  classifier. 
This  saves  the  millions  of  CPU  operations  necessary  to  project 
the  data  onto  these  features.  The  flexibility  of  the  device 
essentially  allows  one  to  move  these  computations  from  the 
computer  to  the  instrument  itself,  essentially  performing  an  ad 
hoc  experiment  that  measures  exactly  the  quantities  of  interest. 

Step  2.  Collection  of  admissible  patches 

Now  that  the  nuclei  spectra  are  identified,  we  define  the 
patches  we  want  to  classify  as  follows.  A  patch  is  a  subset  of 
a  datacube  of  the  form  QlXOtyo  x  S  where  Q'xo,yo  is  a  square 
of  side  l  pixels  long,  centered  at  (xoPyo)  and  S  denotes  the 
complete  spectral  range.  A  patch  is  admissible  if  it  contains  at 
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Fig.  9.  Top  4  discriminant  vectors  for  tissue  classification  found  by  LDB 


Fig.  10.  Projection  of  the  training  set  for  tissue  type  classification  onto  its  top 
3  LDB  vectors.  Red,  green  and  blue  denote  respectively  nuclei,  cytoplasm, 
lamina  propria. 


least  fgl2  nuclei  pixels.  From  now  on  we  will  consider  each 
patch  simply  as  a  collection  of  the  nuclei  spectra  it  contains, 
hence  as  a  cloud  in  R|s|  (with  |5|  =  128  in  our  specific  case). 

We  considered  different  sets  of  patches,  corresponding  to 
l  =  32,64,  and  128,  and  the  results  improved  with  the  patch 
size.  However,  since  they  are  already  very  good  for  l  =  32 
(this  size  corresponds  roughly  to  the  size  of  a  single  nucleus), 
we  present  here  the  results  corresponding  to  l  =  32. 

The  set  of  patches  we  consider  consists  of  2440  patches  of 
size  l  =  128,  collected  randomly,  30  per  slide.  We  denote  by 
{AfyfcjfcgK,  the  set  of  nuclei  spectra  in  the  i-th  patch  P,. 

Step  2:  Construction  of  the  classifier  based  on  the  mean  of 
the  nuclei  spectra  in  each  patch 

For  each  admissible  patch  Pt  collected,  we  compute  the 
mean  of  the  nuclei  spectra  {N^k}k  and  we  normalize  it  to  unit 
energy.  The  label  (normal  or  abnormal)  attached  to  the  patch 
is  transferred  to  the  corresponding  mean  nucleus  spectrum.  We 
used  PLS,  keeping  k:  =  15  top  vectors,  and  we  ran  50  rounds 
of  10-fold  cross-validation  to  make  sure  we  are  not  overfitting. 
The  confusion  matrices  of  the  classifiers  thus  obtained  are 
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Fig.  1 1 .  Two  examples  of  tissue  classification. 


Fig.  13.  Classification  of  nuclei  patches  of  normal  samples:  green,  blue 
and  red  mean  classified  respectively  as  normal,  not-classified,  abnormal 
(malignant). 


summarized  in  table  I  for  classifiers  of  patches  of  size  32. 

We  tried  various  normalizations  of  the  spectra  (energy, 
baseline)  and  various  subband  selections  and  wavelet  packet 
compression  techniques  for  dimensionality  reduction  before 
applying  PLS,  and  to  check  the  stability  of  the  algorithm,  but 
all  these  attempts  gave  very  similar  results.  This  is  due  to 
the  large  number  of  samples  available  for  training,  even  under 
cross-validation. 


V.  Classification  of  biopsies 
To  classify  biopsies,  we  collect  several  admissible  random 
patches  from  the  biopsy,  and  classify  each  of  them.  A  biopsy 
is  considered  normal  if  the  majority  of  patches  are  classified 
as  normal,  and  the  biopsy  is  deemed  malignant  if  a  minimum 
number  (fixed  and  validated  under  cross-validation)  of  patches 
is  deemed  abnormal.  Of  course,  more  conservative  choices 
could  be  made,  depending  on  the  weight  that  is  chosen  for 
biopsies  classified  as  false  positives  or  false  negative.  Since 


Fig.  12.  Local  average  of  nuclei  speclra,  with  1 -standard  deviation  bars. 
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TABLE  II 

Confusion  Matrix,  cross-validated,  of  predictions  of 
CARCINOMA  VS.  NORMAL  ON  136  BIOPSIES,  BY  MAJORITY  VOTE  OF  THE 
CLASSIFIER  ON  32  BY  32  NUCLEI  PATCHES. 
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Fig,  14.  Classification  of  nuclei  patches  of  abnormal  (malignant)  samples: 
green,  blue  and  red  mean  classified  respectively  as  normal,  not-classified, 
abnormal  (malignant). 


Fig,  15.  Classification  result  for  the  various  biopsies,  represented  as  mean 
classification  of  the  nuclei  patches.  The  separation  between  normals  (left, 
around  +1)  and  malignant  (right,  around  -1)  is  sharp. 


the  classification  of  nuclei  patches  is  quite  accurate,  one 
could,  for  example,  conservatively  call  a  slide  malignant  if  a 
minimum  number  m  (e.g.  10)  of  nuclei  patches  are  classified 
as  malignant.  In  our  dataset  this  did  not  make  any  difference 
in  the  classification  of  the  slides,  for  most  reasonable  values 
of  m  (e.g.  m  >  10).  In  any  case,  no  matter  which  criterion  of 
voting  among  the  classified  patches  we  choose,  we  have  no 
errors  in  classifying  each  data  cube  (biopsy). 


TABLE  1 

Confusion  Matrix  of  predictions  of  carcinoma  vs.  normal 
NUCLEI  PATCHES  OF  SIZE  32  BY  32,  AVERAGE  10-FOLD  CROSS-VALIDATED 


ERROR. 

Sensitivity 

93.5% 

Specificity 

93.5% 

TP 

TN 

False  Negative 

6.6% 

pp 

94.0% 

7.3% 

False  Positive 

6.6% 

PN 

6.0% 

92.7% 

Pred  Val  Pos 

94.1% 

Pred  Val  Neg 

92.7% 

Diag  Eff 

93.4% 

VI.  Conclusions 

Hyperspectral  imaging  presents  many  challenges  from  the 
point  of  view  of  data  and  signal  analysis  due  to  the  large 
amount  and  high-dimensionality  of  the  data.  In  this  paper  we 
present  some  tools  from  statistical  data  analysis,  multiscale 
signal  processing  and  nonlinear  dimensionality  reduction,  that 
are  effective  in  analyzing  hyperspectral  data.  We  demonstrate 
a  successful  (100%  diagnostic  efficiency)  application  of  a 
combination  of  these  techniques  to  the  problem  of  automati¬ 
cally  discriminating  between  normal  and  abnormal  (malignant) 
hyperspectral  images  of  colon  biopsies.  Further  research  will 
involve  abnormal  but  non-malignant  samples,  and  discrimi¬ 
nating  between  these  and  the  malignant  abnormal  samples. 
Discriminating  between  these  and  the  malignant  abnormal 
samples  can  permit  automatic  diagnosis. 
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